Faster Clustering via Preprocessing

Authors

  • Tsvi Kopelowitz
  • Robert Krauthgamer
Abstract

We examine the efficiency of clustering a set of points, when the encompassing metric space may be preprocessed in advance. In computational problems of this genre, there is a first stage of preprocessing, whose input is a collection of points M; the next stage receives as input a query set Q ⊂ M, and should report a clustering of Q according to some objective, such as 1-median, in which case the answer is a point a ∈ M minimizing ∑_{q∈Q} d_M(a, q). We design fast algorithms that approximately solve such problems under standard clustering objectives like p-center and p-median, when the metric M has low doubling dimension. By leveraging the preprocessing stage, our algorithms achieve query time that is near-linear in the query size n = |Q|, and is (almost) independent of the total number of points m = |M|.

This work was supported in part by The Israel Science Foundation (grant #452/08), by a US-Israel BSF grant #2010418, and by the Citi Foundation.
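To make the 1-median objective concrete, here is a minimal sketch of the naive baseline, not the paper's algorithm: it scans every candidate a ∈ M and sums distances to the query set Q, costing on the order of m · n distance evaluations. The function name `one_median_brute_force` and the default Euclidean metric are assumptions made for illustration; the abstract only requires a metric of low doubling dimension.

```python
from math import dist
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def one_median_brute_force(
    M: Sequence[Point],
    Q: Sequence[Point],
    d: Callable[[Point, Point], float] = dist,  # assumed metric: Euclidean
) -> Tuple[Point, float]:
    """Return the point a in M minimizing sum_{q in Q} d(a, q), and that sum.

    Naive baseline: evaluates all |M| candidates against all |Q| queries,
    so the running time depends on m = |M|, which is exactly the dependence
    that a preprocessing stage aims to avoid.
    """
    if not M or not Q:
        raise ValueError("M and Q must both be non-empty")

    def cost(a: Point) -> float:
        return sum(d(a, q) for q in Q)

    best = min(M, key=cost)
    return best, cost(best)


# Hypothetical toy instance on a line, with Q a subset of M.
M = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
Q = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
center, total = one_median_brute_force(M, Q)
print(center, total)
```

On this toy instance the middle query point is the best center. The baseline is useful mainly as a correctness reference against which a faster, approximate method could be compared.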

Similar resources

Data Preprocessing via Polynomial Fitting to Improve Clustering of Spoken Arabic Digits

Advancements in speech recognition and voice-to-text technologies have made these topics more popular as of late. When analyzing certain types of time-series data, it is important to properly preprocess the data in order to deal with varied-length instances. In this work we study the influence of three different data preprocessing techniques on the quality of patterns extracted from a dataset o...

Fast and Intuitive Clustering of Web Documents

Conventional document retrieval systems (e.g., Alta Vista) return long lists of ranked documents in response to user queries. Recently, document clustering has been put forth as an alternative method of organizing the results of a retrieval [6]. A person browsing the clusters can discover patterns that would be overlooked in the traditional ranked-list presentation. In this context, a document c...

CPSG-MCMC: Clustering-Based Preprocessing method for Stochastic Gradient MCMC

In recent years, stochastic gradient Markov Chain Monte Carlo (SG-MCMC) methods have been proposed to process large-scale datasets by iteratively learning from small minibatches. However, the high variance caused by naive subsampling usually slows down the convergence to the desired posterior distribution. In this paper, we propose an effective subsampling strategy to reduce the variance based on a ...

A Coarse to Fine Minutiae-Based Latent Palm print Matching and fusion

In this paper, a coarse to fine matching strategy based on minutiae clustering and minutiae match propagation is designed specifically for palmprint matching and to deal with the large database and local feature-based minutiae clustering algorithm is designed to cluster minutiae into several groups such that minutiae belonging to the same group have similar local characteristics. The proposed p...

On the need of unfolding preprocessing for time series clustering

Clustering methods are commonly used on time series, either as a preprocessing for other methods or for themselves. This paper illustrates the problem of clustering applied on regressor vectors obtained from row time series. It is thus shown why time series clustering may sometimes seem meaningless. A preprocessing is proposed to unfold time series and allow a meaningful clustering of regressor...

Journal:
  • CoRR

Volume: abs/1208.5247

Publication date: 2012